AITopics | local sgd

local sgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe

Neural Information Processing Systems, Feb-13-2026, 23:21:32 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, arxiv preprint arxiv, local update, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

c17028c9b6e0c5deaad29665d582284a-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-13-2026, 23:21:17 GMT

algorithm, communication, experiment, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Asymptotic Behaviors of Projected Stochastic Approximation: A Jump Diffusion Perspective

Neural Information Processing Systems, Feb-12-2026, 09:52:01 GMT

In this paper, we consider linearly constrained stochastic approximation problems, with federated learning as a special case.
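The abstract does not say how federated learning becomes a linearly constrained problem. One standard reformulation, assumed here purely for illustration, stacks the worker models $(x_1, \dots, x_N)$ and imposes the linear consensus constraint $x_1 = \dots = x_N$; the Euclidean projection onto that subspace replaces every block with the average of the blocks. The function names `project_onto_consensus` and `projected_sa_step` are hypothetical, and this is a sketch of the general idea, not the paper's algorithm.

```python
from typing import List

Vector = List[float]


def project_onto_consensus(blocks: List[Vector]) -> List[Vector]:
    """Euclidean projection of stacked models (x_1, ..., x_N) onto the linear
    subspace {x_1 = ... = x_N}: every block is replaced by the block average."""
    n = len(blocks)
    dim = len(blocks[0])
    avg = [sum(block[k] for block in blocks) / n for k in range(dim)]
    return [list(avg) for _ in range(n)]


def projected_sa_step(blocks: List[Vector], grads: List[Vector],
                      step_size: float) -> List[Vector]:
    """One projected stochastic-approximation step (hypothetical helper):
    a stochastic-gradient step on each block, then projection back onto
    the consensus constraint."""
    moved = [[x - step_size * g for x, g in zip(block, grad)]
             for block, grad in zip(blocks, grads)]
    return project_onto_consensus(moved)


print(project_onto_consensus([[1.0, 2.0], [3.0, 6.0]]))  # [[2.0, 4.0], [2.0, 4.0]]
```

Projecting after every step makes all workers move by the averaged gradient, which is ordinary synchronous distributed SGD; applying the projection only every few steps corresponds to local SGD with periodic averaging.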

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Virginia (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

cc06a6150b92e17dd3076a0f0f9d2af4-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 04:58:52 GMT

communication, communication round, sgd, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Minibatch vs Local SGD for Heterogeneous Distributed Learning

Neural Information Processing Systems, Feb-8-2026, 06:36:49 GMT

Given the massive scale of many modern machine learning models and datasets, it has become important to develop better methods for distributed training.

artificial intelligence, machine learning, minibatch sgd, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Neural Information Processing Systems, Feb-8-2026, 02:46:30 GMT

From rem C.10 included STEM requires O m (b

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio (0.05)
North America > United States > Minnesota (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.66)

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Neural Information Processing Systems, Dec-25-2025, 22:48:29 GMT

Communication overhead is one of the key challenges that hinders the scalability of distributed optimization algorithms. In this paper, we study local distributed SGD, where data is partitioned among computation nodes, and the computation nodes perform local updates while periodically exchanging their models to perform averaging. While local SGD has been shown empirically to provide promising results, a theoretical understanding of its performance remains open. We strengthen the convergence analysis for local SGD and show that local SGD can be far less expensive and applied far more generally than current theory suggests. Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed-up, that is, an error of $O(1/(pT))$, where $p$ is the number of workers and $T$ is the total number of model updates at each worker. This contrasts with previous work, which required a larger number of communication rounds and was limited to strongly convex loss functions to achieve similar asymptotic performance. We also develop an adaptive synchronization scheme that provides a general condition for linear speed-up.
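To make "local updates with periodic averaging" concrete, here is a minimal sketch, assuming each worker $i$ holds a one-dimensional quadratic $f_i(x) = \tfrac12 (x - b_i)^2$ (a strongly convex quadratic, so the Polyak-Łojasiewicz condition holds) and sees Gaussian gradient noise. The function name `local_sgd`, the step size, the noise level, and the fixed averaging period `sync_every` are illustrative assumptions; the paper's adaptive synchronization scheme is not reproduced here.

```python
import random
from statistics import mean
from typing import List, Tuple


def local_sgd(targets: List[float], total_steps: int, sync_every: int,
              lr: float = 0.05, noise: float = 0.1, seed: int = 0) -> Tuple[float, int]:
    """Local SGD with periodic averaging (hypothetical helper).

    Worker i minimizes f_i(x) = 0.5 * (x - targets[i])**2 with noisy gradients.
    Every `sync_every` steps all models are replaced by their average,
    which counts as one communication round.
    Returns (final averaged model, number of communication rounds)."""
    rng = random.Random(seed)
    p = len(targets)
    models = [0.0] * p
    rounds = 0
    for t in range(1, total_steps + 1):
        for i in range(p):
            # Noisy gradient of the local quadratic on worker i.
            grad = (models[i] - targets[i]) + rng.gauss(0.0, noise)
            models[i] -= lr * grad
        # Periodic averaging, plus a final round so every worker ends in agreement.
        if t % sync_every == 0 or t == total_steps:
            avg = mean(models)
            models = [avg] * p
            rounds += 1
    return models[0], rounds


x, r = local_sgd([1.0, 2.0, 3.0, 4.0], total_steps=2000, sync_every=20)
print(r)  # 100
```

The returned model should land close to 2.5, the minimizer of the average objective, while the workers communicate only once every 20 local steps. Because all local quadratics share the same curvature in this toy setup, averaging introduces no drift bias; heterogeneous curvatures would make the choice of `sync_every` matter more.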

local sgd, name change, tighter analysis and adaptive synchronization, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization

Neural Information Processing Systems, Dec-25-2025, 03:30:58 GMT

Local SGD, a cornerstone algorithm in federated learning, is widely used in training deep neural networks and has been shown to have strong empirical performance. A theoretical understanding of such performance on nonconvex loss landscapes is currently lacking. Analyzing the global convergence of SGD is challenging because the noise depends on the model parameters. Indeed, many works narrow their focus to GD and rely on injected noise to enable convergence to a local or global optimum. For local SGD, existing analyses in the nonconvex case can only guarantee finding stationary points, or they assume the neural network is overparameterized so as to guarantee convergence to the global minimum through neural tangent kernel analysis.

global convergence analysis, local sgd, two-layer neural network, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Communication-efficient SGD: From Local SGD to One-Shot Averaging

Neural Information Processing Systems, Dec-24-2025, 22:12:47 GMT

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $N$ workers, who can take SGD steps and coordinate with a central server. While it is possible to obtain a linear reduction in the variance by averaging all the stochastic gradients at every step, this requires a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in earlier literature, suggests that machines should make many local steps between such communications. While the initial analysis of Local SGD showed that it needs $\Omega(\sqrt{T})$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(NT)$, this has been successively improved in a string of papers, with the state of the art requiring $\Omega\left(N \, \mathrm{poly}(\log T)\right)$ communications. In this paper, we suggest a Local SGD scheme that communicates less overall by communicating less frequently as the number of iterations grows. Our analysis shows that this can achieve an error that scales as $1/(NT)$ with a number of communications that is completely independent of $T$. In particular, we show that $\Omega(N)$ communications are sufficient. Empirical evidence suggests this bound is close to tight: we further show in simulations that $\sqrt{N}$ or $N^{3/4}$ communications fail to achieve linear speed-up. Moreover, we show that under mild assumptions, chiefly twice differentiability on any neighborhood of the optimal solution, one-shot averaging, which uses only a single round of communication, can also achieve the optimal convergence rate asymptotically.
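The abstract does not give the exact communication schedule. As an assumption for illustration only, the sketch below lets the $k$-th interval between averaging rounds grow linearly, $H_k = a k$ for a base interval $a$, so workers communicate less often as training proceeds. Holding the number of rounds $R$ fixed and scaling $a$ with $T$ keeps the communication count independent of $T$, and $R = 1$ reduces to one-shot averaging. The helpers `linear_growth_schedule` and `base_interval_for` are hypothetical names, not the paper's code.

```python
from typing import List


def linear_growth_schedule(base_interval: int, rounds: int) -> List[int]:
    """Iterations at which workers average their models, assuming the k-th
    interval between averaging rounds has length base_interval * k
    (k = 1, ..., rounds). The last entry is the total number of local steps,
    base_interval * rounds * (rounds + 1) // 2."""
    if base_interval < 1 or rounds < 1:
        raise ValueError("base_interval and rounds must be positive integers")
    times: List[int] = []
    t = 0
    for k in range(1, rounds + 1):
        t += base_interval * k  # intervals grow linearly with the round index
        times.append(t)
    return times


def base_interval_for(total_steps: int, rounds: int) -> int:
    """Largest base interval whose schedule fits within total_steps while
    using exactly `rounds` communication rounds."""
    return max(1, total_steps // (rounds * (rounds + 1) // 2))


print(linear_growth_schedule(10, 4))  # [10, 30, 60, 100]
```

For a budget of 10,000 steps and 8 rounds, `base_interval_for(10000, 8)` gives 277, and the schedule ends at step 9,972; the few leftover steps could simply be appended to the final interval. Doubling the step budget doubles the base interval but leaves the number of rounds unchanged.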

artificial intelligence, communication, machine learning, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.81)
